14 research outputs found
Embeddings for word sense disambiguation: an evaluation study
Recent years have seen a dramatic growth in the popularity of word embeddings, mainly owing to their ability to capture semantic information from massive amounts of textual content. As a result, many tasks in Natural Language Processing have tried to take advantage of the potential of these distributional models. In this work, we study how word embeddings can be used in Word Sense Disambiguation, one of the oldest tasks in Natural Language Processing and Artificial Intelligence. We propose different methods through which word embeddings can be leveraged in a state-of-the-art supervised WSD system architecture, and perform an in-depth analysis of how different parameters affect performance. We show how a WSD system that makes use of word embeddings alone, if designed properly, can provide significant performance improvement over a state-of-the-art WSD system that incorporates several standard WSD features.
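The core idea described above, using word embeddings as features for sense disambiguation, can be illustrated with a minimal sketch: average the embeddings of the context words and pick the sense whose gloss vector lies closest. The toy embeddings, the `bank` senses, and the gloss words here are all hypothetical stand-ins, not the paper's actual setup.

```python
import numpy as np

# Hypothetical 2-d toy embeddings; a real system would load pretrained vectors.
EMB = {
    "money":   np.array([1.0, 0.2]),
    "deposit": np.array([0.9, 0.3]),
    "river":   np.array([0.1, 1.0]),
    "water":   np.array([0.2, 0.9]),
}

def avg_vec(words):
    """Average the embeddings of the known words in a context window."""
    return np.mean([EMB[w] for w in words if w in EMB], axis=0)

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def disambiguate(context, sense_glosses):
    """Pick the sense whose gloss vector is closest to the context vector."""
    ctx = avg_vec(context)
    return max(sense_glosses, key=lambda s: cos(ctx, avg_vec(sense_glosses[s])))

senses = {
    "bank/finance": ["money", "deposit"],
    "bank/river":   ["river", "water"],
}
print(disambiguate(["deposit", "money"], senses))  # bank/finance
```

A supervised system such as the one studied in the paper would feed such context vectors into a trained classifier rather than using raw cosine similarity.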
SensEmbed: Learning sense embeddings for word and relational similarity
Word embeddings have recently gained considerable popularity for modeling words in different Natural Language Processing (NLP) tasks including semantic similarity measurement. However, notwithstanding their success, word embeddings are by their very nature unable to capture polysemy, as different meanings of a word are conflated into a single representation. In addition, their learning process usually relies on massive corpora only, preventing them from taking advantage of structured knowledge. We address both issues by proposing a multifaceted approach that transforms word embeddings to the sense level and leverages knowledge from a large semantic network for effective semantic similarity measurement. We evaluate our approach on word similarity and relational similarity frameworks, reporting state-of-the-art performance on multiple datasets.
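Once representations live at the sense level, a common way to score word similarity is to take the closest pair of sense embeddings of the two words. The sketch below assumes hypothetical toy sense vectors and sense labels; it only illustrates the sense-level comparison, not SensEmbed's actual training or its semantic-network component.

```python
import numpy as np

# Hypothetical sense embeddings; real sense vectors are learned from
# sense-annotated text linked to a semantic network.
SENSES = {
    "bank":  {"bank#finance": np.array([1.0, 0.0]),
              "bank#river":   np.array([0.0, 1.0])},
    "shore": {"shore#land":   np.array([0.1, 0.95])},
}

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def word_sim(w1, w2):
    """Word similarity as the similarity of the closest sense pair."""
    return max(cos(s1, s2)
               for s1 in SENSES[w1].values()
               for s2 in SENSES[w2].values())

# "bank" is similar to "shore" via its river sense, even though the
# finance sense points in a very different direction.
print(round(word_sim("bank", "shore"), 2))
```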
Embedding Words and Senses Together via Joint Knowledge-Enhanced Training
Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be automatically separated, as it conflates them into a single vector. We address this issue by proposing a new model which learns word and sense embeddings jointly. Our model exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and sense embeddings. We evaluate the main features of our approach both qualitatively and quantitatively in a variety of tasks, highlighting the advantages of the proposed method in comparison to state-of-the-art word- and sense-based models.
Topic-Aware Response Generation in Task-Oriented Dialogue with Unstructured Knowledge Access
To alleviate the problem of structured databases' limited coverage, recent
task-oriented dialogue systems incorporate external unstructured knowledge to
guide the generation of system responses. However, these usually use word- or
sentence-level similarities to detect the relevant knowledge context, which
only partially capture topical-level relevance. In this paper, we examine
how to better integrate topical information in knowledge grounded task-oriented
dialogue and propose ``Topic-Aware Response Generation'' (TARG), an end-to-end
response generation model. TARG incorporates multiple topic-aware attention
mechanisms to derive the importance weighting scheme over dialogue utterances
and external knowledge sources towards a better understanding of the dialogue
history. Experimental results indicate that TARG achieves state-of-the-art
performance in knowledge selection and response generation, outperforming
previous state-of-the-art by 3.2, 3.6, and 4.2 points in EM, F1 and BLEU-4
respectively on Doc2Dial, and performing comparably with previous work on
DSTC9; both are knowledge-grounded task-oriented dialogue datasets.
Comment: Findings of EMNLP 202
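The importance-weighting idea described above can be sketched in a much-simplified form: score each knowledge snippet against a topic vector and normalise the scores into attention weights. The vectors and the single dot-product scorer here are hypothetical stand-ins; TARG combines multiple topic-aware attention mechanisms over both dialogue utterances and external knowledge.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

def topic_aware_weights(topic_vec, snippet_vecs):
    """Weight knowledge snippets by similarity to the dialogue topic vector."""
    scores = np.array([float(np.dot(topic_vec, v)) for v in snippet_vecs])
    return softmax(scores)

topic = np.array([1.0, 0.0])            # hypothetical topic embedding
snippets = [np.array([0.9, 0.1]),       # on-topic knowledge snippet
            np.array([0.1, 0.9])]       # off-topic knowledge snippet
w = topic_aware_weights(topic, snippets)
print(w[0] > w[1])  # True: the on-topic snippet receives more weight
```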
Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation
Determining the plausibility of causal relations between clauses is a
commonsense reasoning task that requires complex inference ability. The general
approach to this task is to train a large pretrained language model on a
specific dataset. However, the available training data for the task is often
scarce, which leads to instability of model training or reliance on the shallow
features of the dataset. This paper presents a number of techniques for making
models more robust in the domain of causal reasoning. Firstly, we perform
adversarial training by generating perturbed inputs through synonym
substitution. Secondly, based on a linguistic theory of discourse connectives,
we perform data augmentation using a discourse parser for detecting causally
linked clauses in large text, and a generative language model for generating
distractors. Both methods boost model performance on the Choice of Plausible
Alternatives (COPA) dataset, as well as on a Balanced COPA dataset, which is a
modified version of the original data that has been developed to avoid
superficial cues, leading to a more challenging benchmark. We show a
statistically significant improvement in performance and robustness on both
datasets, even with only a small number of additionally generated data points.
Comment: 7 pages + pages references, 4 figures, 3 tables, paper accepted at AAAI202
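The first technique above, adversarial training via synonym substitution, can be sketched as a simple perturbation function: each eligible token is swapped for a random synonym with some probability. The synonym table here is a hypothetical stand-in for the lexical resource a real setup would use, and this shows only the input perturbation, not the training loop.

```python
import random

# Hypothetical synonym table; a real setup would draw substitutions
# from a lexical resource such as a thesaurus.
SYNONYMS = {"big": ["large", "huge"], "fast": ["quick", "rapid"]}

def perturb(tokens, p=0.5, rng=None):
    """Return an adversarial variant of `tokens`: each token with a known
    synonym is replaced by a random synonym with probability p."""
    rng = rng if rng is not None else random.Random(0)
    out = []
    for t in tokens:
        if t in SYNONYMS and rng.random() < p:
            out.append(rng.choice(SYNONYMS[t]))
        else:
            out.append(t)
    return out

print(perturb(["the", "big", "dog", "runs", "fast"], p=1.0))
```

During adversarial training, such perturbed inputs are mixed into the batches so the model cannot rely on the surface form of individual words.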
The Regular Expression Inference Challenge
We propose \emph{regular expression inference (REI)} as a challenge for
code/language modelling, and the wider machine learning community. REI is a
supervised machine learning (ML) and program synthesis task, and poses the
problem of finding minimal regular expressions from examples: given two finite
sets of strings $P$ (positive examples) and $N$ (negative examples) and a cost
function $\mathrm{cost}$, the task is to generate a regular expression $r$ that
accepts all strings in $P$ and rejects all strings in $N$, while no other such
expression $r'$ exists with $\mathrm{cost}(r') < \mathrm{cost}(r)$.
REI has advantages as a challenge problem: (i) regular expressions are
well-known, widely used, and a natural idealisation of code; (ii) REI's
asymptotic worst-case complexity is well understood; (iii) REI has a small
number of easy-to-understand parameters (e.g.\ the cardinality of $P$ or $N$,
the string lengths of examples, or the cost function); this lets us easily fine-tune
REI-hardness; (iv) REI is an unsolved problem for deep learning based ML.
Recently, an REI solver was implemented on GPUs, using program synthesis
techniques. This enabled, for the first time, fast generation of minimal
expressions for complex REI instances. Building on this advance, we generate
and publish the first large-scale datasets for REI, and devise and evaluate
several initial heuristic and machine learning baselines.
We invite the community to participate and explore ML methods that learn to
solve REI problems. We believe that progress in REI directly translates to
code/language modelling.
Comment: 7 pages, 3 pages appendix, 6 table
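The REI task statement can be made concrete with a tiny brute-force baseline: enumerate candidate patterns over a small alphabet and keep the cheapest one that accepts every positive example and rejects every negative one. The restricted pattern grammar, the alphabet, and the length-based cost are illustrative simplifications, not the GPU solver or cost functions from the paper.

```python
import re
from itertools import product

ALPHABET = "ab"
# A tiny fragment of regex syntax: literals, wildcard, and starred literals.
ATOMS = list(ALPHABET) + ["."] + [c + "*" for c in ALPHABET]

def candidates(max_len):
    """Enumerate concatenations of up to max_len atoms."""
    for n in range(1, max_len + 1):
        for combo in product(ATOMS, repeat=n):
            yield "".join(combo)

def rei(pos, neg, cost=len, max_len=3):
    """Return a minimal-cost pattern accepting all of pos, rejecting all of neg."""
    best = None
    for pat in candidates(max_len):
        if all(re.fullmatch(pat, s) for s in pos) and \
           not any(re.fullmatch(pat, s) for s in neg):
            if best is None or cost(pat) < cost(best):
                best = pat
    return best

print(rei(pos=["a", "aa"], neg=["b", "ab"]))  # a*
```

Even this toy search makes the hardness knobs visible: growing the alphabet, the example lengths, or the pattern grammar blows up the candidate space, which is why exhaustive enumeration only works for the smallest instances.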
Retrieving Multi-Entity Associations: An Evaluation of Combination Modes for Word Embeddings
Word embeddings have gained significant attention as learnable
representations of semantic relations between words, and have been shown to
improve upon the results of traditional word representations. However, little
effort has been devoted to using embeddings for the retrieval of entity
associations beyond pairwise relations. In this paper, we use popular embedding
methods to train vector representations of an entity-annotated news corpus, and
evaluate their performance for the task of predicting entity participation in
news events versus a traditional word cooccurrence network as a baseline. To
support queries for events with multiple participating entities, we test a
number of combination modes for the embedding vectors. While we find that even
the best combination modes for word embeddings do not quite reach the
performance of the full cooccurrence network, especially for rare entities, we
observe that different embedding methods model different types of relations,
thereby indicating the potential for ensemble methods.
Comment: 4 pages; Accepted at SIGIR'1
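The "combination modes" evaluated above amount to different ways of merging the embedding vectors of the query entities into a single query vector. A minimal sketch of three standard modes (the mode names here are generic, not necessarily the paper's exact set):

```python
import numpy as np

def combine(vectors, mode="avg"):
    """Combine entity embeddings into one query vector for multi-entity retrieval."""
    V = np.stack(vectors)
    if mode == "avg":
        return V.mean(axis=0)
    if mode == "sum":
        return V.sum(axis=0)
    if mode == "max":
        return V.max(axis=0)   # element-wise maximum
    raise ValueError(f"unknown combination mode: {mode}")

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(combine([e1, e2], "avg"))  # [0.5 0.5]
```

The combined vector is then used as a single query point, e.g. for nearest-neighbour search over candidate event representations.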